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ABSTRACT 


A  computational  procedure  is  presented  for  the  nonlinear  dynamic  analysis  of  unsymmetric 
structures  on  vector  multiprocessor  systems.  The  procedure  is  based  on  a  novel  hierarchical 
partitioning  strategy  in  which  the  response  of  the  unsymmetric  structure  at  any  time  instant  is 
approximated  by  a  linear  combination  of  symmetric  and  antisymmetric  response  vectors,  each 
obtained  by  using  only  a  fraction  of  the  degrees  of  freedom  of  the  original  finite  element  model. 
The  three  key  elements  of  the  procedure  which  result  in  a  high  degree  of  concurrency  throughout 
the  solution  process  are  (1)  mixed  (or  primitive  variable)  formulation  with  independent  shape 
functions  for  the  different  fields,  (2)  operator  splitting  or  restructuring  of  the  discrete  equations  at 
each  time  step  to  delineate  the  symmetric  and  antisymmetric  vectors  constituting  the  response, 
and  (3)  two-level  iterative  process  for  generating  the  response  of  the  structure.  An  assessment  is 
made  of  the  effectiveness  of  the  procedure  on  the  CRAY  X-MP/4  computers. 

1.  INTRODUCTION 

Dynamic  response  calculations  for  future  aerospace  systems  are  expected  to  require 
processing  rates  far  in  excess  of  the  upper  limit  of  computers  built  around  a  single  processing 
unit.  This  is  because  of  the  complexity  of  these  systems  and  the  high  degree  of  sophistication  of 
the  computational  models  required  for  simulating  their  response  (see,  for  example,  Ref.  1).  The 
introduction  of  novel  forms  of  machine  architecture  has  brightened  the  prospects  for  meeting 
future  large-scale  computational  needs.  Most  of  the  new  machines  achieve  high  performance 
through  vectorization  and/or  parallelism.  These  include  top-of-the-range  supercomputers  such  as 
CRAY  X-MP,  CRAY  2,  and  ETA- 10,  as  well  as  computers  based  on  readily  available  and 
inexpensive  basic  units  such  as  the  hypercubes  and  the  Connection  Machine.  The  characteristics 
of  several  new  machines  are  summarized  in  Refs.  2  to  5. 

Much  work  has  been  devoted  to  the  development  of  vector  and  parallel  numerical  algo¬ 
rithms  for  performing  matrix  operations,  solution  of  algebraic  equations,  and  extraction  of 
eigenvalues  (sep,  for  example,  Refs.  6  to  10).  Also,  a  number  of  special  strategies  have  been 
proposed  for  increasing  the  degree  of  parallelism  and/or  vectorization  in  finite  element  comput¬ 
ations.  These  strategies  include  domain  decomposition  (or  substructuring),  operator  splitting. 


and  element-by-element  techniques.  They  are  related  to  the  "divide  and  conquer"  paradigm 
based  on  breaking  a  large  (complex)  problem  into  a  number  of  smaller  (simpler)  subproblems 
which  may  be  solved  independently  on  distinct  processors.  The  degree  of  independence  of  the 
subproblems  is  a  measure  of  the  effectiveness  of  the  algorithm  since  it  determines  the  amount 
and  frequency  of  communication  and  synchronization. 

Substructuring  techniques  when  combined  with  operator  splitting  and  iterative  procedures 
result  in  computational  strategies  which  are  well-suited  for  parallel  vectorization  (see  Refs.  1 1 
and  12).  However,  for  the  strategy  to  be  effective  the  following  two  conditions  must  be  satisfied: 

1 .  The  partitioning  of  a  discretized  structure  into  substructures  must  achieve  well-balanced 
workload  distribution  among  the  different  processors. 

2.  The  iterative  process  used  to  account  for  the  coupling  between  the  different  substruc¬ 
tures  must  be  rapidly  converging.  This,  in  turn,  requires  the  substructures  not  to  be  strongly 
coupled. 

The  two  conditions  are  difficult  to  satisfy,  and  to  date,  despite  a  number  of  attempts  to 
develop  partitioning  strategies  (see,  for  example,  Refs.  13  and  14),  no  general  strategy  exists 
which  satisfies  the  aforementioned  two  conditions.  The  present  study  focuses  on  this  problem. 
Specifically,  the  objective  of  this  paper  is  to  present  an  effective  hierarchical  partitioning  strategy 
for  use  in  large-scale  nonlinear  finite  element  dynamic  analysis  on  multiprocessor  computers.  To 
sharpen  the  focus  of  the  study,  multiprocessor  computers  with  a  small  number  of  powerful 
processors  such  as  the  CRAY  X-MP/4  are  considered. 

2.  MATHEMATICAL  FORMULATION 

The  analytical  formulation  is  based  on  a  moderate-rotation  geometrically  nonlinear  theory 
of  the  structure  (see  Ref.  15).  All  the  dissipative  fo-ces  are  neglected.  A  mixed  formulation  is 
used  in  which  the  fundamental  unknowns  consist  of  ?  internal  forces  (or  stress  resultants),  the 
generalized  displacements,  and  the  velocity  components  of  the  structure.  A  total  Lagrangian 
description  of  the  deformation  of  the  structure  is  used,  in  which  the  configurations  of  the  struc¬ 
ture  at  different  times  are  referred  to  the  initial  coordinate  system  of  the  undeformed  structure. 

The  spatial  discretization  is  performed  by  dividing  the  structure  into  finite  elements.  The 


degree  of  the  polynomial  shape  functions  used  for  approximating  the  generalized  displacements 
and  velocity  components  differs  from  the  degree  of  the  polynomials  used  in  approximating  the 
stress  resultants.  Moreover,  continuity  of  stress  resultants  is  not  imposed  at  interelement  bound¬ 
aries.  The  semidiscrete  system  of  equations  governing  the  dynamic  response  of  the  structure 
consists  of  the  following  three  sets  of  relations: 

Constitutive  relations : 

|F|{H}  =  [SJ  { X }  +  {G(X)J  (1) 

Equations  of  motion: 

IM|{V(  ={P[-[S|t(H} -{N(H,X)>  (2) 

Relations  between  velocity  and  displacement  components: 

{X}={V}  (3) 

where  {X}  and  {V}  are  the  vectors  of  unknown  nodal  displacements  and  velocities,  respectively; 
(H)  is  the  vector  of  stress-resultant  parameters;  [FJ  is  a  block  diagonal  matrix  of  element 
flexibilities  (with  each  diagonal  block  associated  with  the  flexibility  matrix  of  a  single  element); 
|S|  is  the  linear  strain-displacement  matrix;  { P}  is  the  vector  of  external  forces;  (MJ  is  a  banded 
matrix  of  the  consistent  mass  coefficients:  {G(X)}  and  {N(H,X)J  are  the  vectors  of  nonlinear 
terms;  superscript  T  denotes  transposition;  and  a  dot  over  the  symbol  indicates  a  derivative  with 
respect  to  time.  In  the  moderate-rotation  theory  used  herein,  the  vector  {G(X)j  is  quadratic  in 
(X};and  the  vector  (N(H,X))  is  bilinear  in  {H }  and  (X). 

The  temporal  integration  is  performed  by  using  an  implicit,  multistep,  one-derivative 
scheme.  The  resulting  nonlinear  simultaneous  algebraic  equations  at  each  time  step  are  solved 
by  using  a  Newton-Raphson  iterative  technique.  The  iterative  process  is  represented  by  the 
following  equations  for  the  stress  parameters  and  velocity  components  of  the  rth  iteration  cycle 
(se'e  Appendix  A): 


|K]f>  {Art!”  =  {Q}Sr) 

(4) 

2 

II 

-* 

+ 

> 

< 

.  — t 

(5) 

where  [K|jr)  is  a  global  structure  matrix  which  includes  the  flexibility  [FJ,  linear  strain  displace- 


ment  [S],  and  nonlinear  contributions  (G(X)}  and  (N(H,X)}  from  different  elements  (see 

Appendix  A);  {Q}-r)  is  the  global  right-hand  side  vector,  and  {\|/}jr+1)  =■!  I  are  the  stress 

parameters  and  velocity  components  at  the  end  of  the  rth  iteration  cycle  at  time  i.  The  procedure 
for  evaluating  the  corresponding  nodal  displacements,  {X}jr+1),  is  outlined  in  Appendix  A. 

3.  BASIC  IDEA  AND  KEY  ELEMENTS 
OF  THE  PROPOSED  PROCEDURE 

The  proposed  procedure  is  based  on  approximating  the  response  of  the  unsymmetric 
structure,  at  any  time  instant,  by  a  linear  combination  of  symmetric  and  antisymmetric  response 
vectors  (or  modes).  Each  of  these  modes  is  generated  with  the  use  of  a  reduced-size  model  of  the 
original  structure.  The  size  of  the  reduced  model  depends  on  the  number  of  symmetry  operations 
used. 

The  model-size  reduction  process  is  depicted  in  Figs.  1  and  2.  It  can  be  thought  of  as  a 
hierarchical  partitioning  strategy,  in  which  the  original  structure  is  replaced  by  an  equivalent 
symmetrized  structure,  having  the  same  symmetric/antisymmetric  components  of  the  response. 

The  top  left  sketch  in  Fig.  1  shows  the  original  unsymmetric  finite  element  model.  The 
response  vector  { V }  is  decomposed  into  symmetric  and  antisymmetric  vectors  { V  }s  and  { V }., 
with  respect  to  line  ab  (see  Fig.  2).  Each  pair  of  complementary  elements  (with  respect  to  ab)  is 
replaced  by  a  single  effective  element  in  the  symmetrized  structure.  Since  each  of  the  symmetric 
and  antisymmetric  components  of  the  response  vector  is  generated  by  using  only  one  half  the 
symmetrized  structure,  the  decomposition  process  amounts  to  partitioning  the  original  structure 
into  two  substructures. 

The  process  is  repeated  to  effect  further  reduction  in  the  size  of  the  substructures  by 
decomposing  each  of  the  vectors  { V}s  and  {V}a  into  symmetric/antisymmetric  subvectors  with 

=  at  ~  ~ 

respect  to  line  cd  in  Fig.  1.  Each  of  the  resulting  four  vectors  {V},{V},{V},  and  {V}  can  be 

generated  by  using  only  one  quadrant  of  the  symmetrized  structure  (shown  in  the  bottom  right 
sketch  of  Fig.  1).  Henceforth,  a  bar  (-)  and  a  tilde  (~)  over  a  vector  refer  to  the  symmetric  and 
antisymmetric  components  of  the  vector,  respectively. 


If  the  fundamental  unknowns  are  selected  to  be  the  symmetric/antisymmetric  components 
of  the  response  vector,  the  governing  equations  in  these  unknowns  can  be  obtained  by  using 
simple  matrix  transformations  on  pairs  of  complementary  elements  (see  Appendix  B).  For 
symmetric  structures,  the  governing  equations  in  the  symmetric/antisymmetric  components  of  the 
response  vector  are  uncoupled.  On  the  other  hand,  for  unsymmetric  structures,  the  equations  are 
coupled.  In  the  proposed  procedure,  the  uncoupled  equations  are  used  as  a  preconditioner,  and 
the  coupling  is  accounted  for  by  using  an  iterative  process. 

The  three  key  elements  of  the  proposed  procedure  are  (1)  mixed  (primitive  variable) 
formulation  with  independent  shape  functions  for  the  stress  resultants,  generalized  displace¬ 
ments,  and  velocity  components,  and  with  the  stress  resultants  allowed  to  be  discontinuous  at 
interelement  boundaries;  (2)  operator  splitting  or  restructuring  of  the  discrete  equations  of  the 
structure  to  delineate  the  contributions  to  the  symmetric  and  antisymmetric  vectors  constituting 
the  response;  and  (3)  two-level  iterative  process  (with  nested  iteration  loops)  to  generate  the 
symmetric  and  antisymmetric  components  of  the  response  vectors  at  each  time  step.  The  top- 
and  bottom-lf’vel  iterations  (outer  and  inner  iteration  loops)  are  performed  by  using  Newton- 
Raphson  and  preconditioned  conjugate  gradient  (PCG)  techniques,  respectively. 

4.  COMPUTATIONAL  PROCEDURE 

A  flow  chart  of  the  computational  procedure  is  given  in  Table  1.  The  application  of  the 
proposed  procedure  to  structural  dynamics  problems  can  be  conveniently  divided  into  two 
phases,  each  of  which  is  well-suited  for  parallel  processing: 

1 .  Preprocessing  phase  in  which  the  finite  element  model  of  the  entire  structure  is  used. 
The  major  steps  in  this  phase  are  hierarchical  partitioning  of  the  response  vectors,  introducing  the 
various  transformation  matrices  (see  Appendix  B  and  Ref.  16),  application  of  operator  splitting, 
and  generation  of  the  various  arrays  for  the  reduced-size  model  through  successive  use  of  pairs 
of  complementary  elements  from  the  model  (see  Fig.  1).  The  number  of  degrees  of  freedom  in 
the  reduced-size  model  is  approximately  1/m  that  of  the  original  finite  element  model,  where  m  is 
an  integer  which  depends  on  the  number  of  symmetry  operations  used.  For  the  sake  of  load 
balancing,  it  is  desirable  to  select  the  symmetries  such  that  m  equals  the  number  of  processors 


(i.e.,  4  for  the  CRAY  X-MP/4). 

2.  Solution  phase  in  which  only  the  reduced-size  model  is  used  in  generating  the  transient 
response  of  the  structure.  The  fundamental  unknowns  in  this  phase  are  the  velocity  components, 
stress  resultants,  and  generalized  displacements  of  one  of  the  substructures. 

A  two-level  iterative  process  (with  two  nested  iteration  loops)  is  used  to  generate  the 
symmetric/antisymmetric  components  of  the  response  vectors  at  each  time  step.  The  top-level 
iteration  (outer  iteration  loop)  is  the  Newton-Raphson  iteration,  and  the  bottom-level  iteration 
(inner  iteration  loop)  is  the  PCG  iteration.  The  details  of  the  PCG  iterations  are  given  in  Appen¬ 
dix  C. 

The  following  comments  regarding  the  procedure  seem  to  be  in  order: 

1 .  The  decomposition  of  the  response  vectors  into  symmetric  and  antisymmetric  compo¬ 
nents  can  be  performed  by  means  of  matrix  transformations.  For  example,  the  symmetric  and 
antisymmetric  components  of  the  velocity  vector  can  be  expressed  as  follows  (see  Fig.  2): 


<V}s=[Tv],W 

{V|„=[Tv]2{V}s 

m„=[Tv]2(V)„ 


m,=[Tv],{V} 

m,.„=nv]2{v}s 

mM=iTV]2{v}, 


(6) 


The  symmetry  transformations  and  the  explicit  form  of  the  transformation  matrices 
[Ty]  ard  [Tyj  are  given  in  Appendix  B. 

2.  The  aforementioned  transformations  are  used  in  conjunction  with  the  operator  splitting 
technique  to  delineate  the  different  symmetric  and  antisymmetric  components  of  the  response. 
For  the  case  of  one  symmetric  operation  (ni=2),  Eqs.  4  can  be  cast  as: 

[[K]+Xo[fc|]{Ay>={Q}+MQ}  (7) 

where  the  subscript  i  and  the  superscript  (r)  are  dropped  for  convenience.  Similarly,  for  the  case 
of  two  symmetry  operations  (m=4),  Eqs.  4  can  be  cast  as: 

||K)  +  A,  |K|  +  A,,  (lie)  +  A||K|)|(A\|/}  =  {Q(  +?,  {Q[  +Xo  ({Of  +MQ»  <*> 
Tracing  parameters  An  and  A,  are  introduced  in  Eqs.  7  and  8  to  identify  the  cotrection 


terms  to  the  symmetric  response  vectors.  The  case  Xo=0  in  Eqs.  7  corresponds  to  a  symmetric 
response  with  respect  to  ab  (see  Fig.  1),  and  the  case  Xo=A,j  =0  in  Eqs.  8  corresponds  to  a  sym¬ 
metric  response  with  respect  to  both  cb  and  cd.  For  A.o=A,t=l,  Eqs.  7  and  8  are  equivalent  to 
Eqs.  4.  Therefore,  and  are  tracing  parameters  which  identify  all  the  coupling  terms  to  the 
symmetric/antisymmetric  components  of  the  response  vectors.  The  explicit  forms  of  the  arrays 
[Kl ,  |K| ,  {Q} ,  and  {Q}  in  Eqs.  7  are  given  in  Appendix  B  (Eqs.  B22  to  B25).  The  vector 
(Aip)  refers  to  the  increments  of  the  symmetric  and  the  antisymmetric  components  of  the  re¬ 
sponse  vector.  As  shown  in  Appendix  B,  the  equations  associated  with  each  of  the  symmetric 
and  antisymmetric  components  of  the  response  vector  are  the  same,  except  for  the  terms  associ¬ 
ated  with  the  unknowns  at  the  interfaces.  Therefore,  only  one  set  of  these  equations  needs  to  be 
considered.  The  sizes  of  the  arrays  in  Eqs.  7  are  approximately  1/2  in  the  original  equations 
(Eqs.  4).  Similarly,  the  sizes  of  the  arrays  in  Eqs.  8  are  approximately  1/4  the  size  of  the  cor¬ 
responding  ones  in  Eqs.  4. 

3.  Equations  8  (or  7)  are  solved  by  using  the  preconditioned  conjugate  gradient  (PCG) 

technique.  The  matrix  [K]  (or  ( K  J)  is  selected  as  the  preconditioning  matrix.  For  the  case  ni=4, 

the  preconditioning  matrix  needs  to  be  decomposed  four  times  corresponding  to  the  four  different 
combinations  of  symmetry  and  antisymmetry  conditions  along  cb  and  cd  (see  Fig.  1).  The  four 
decompositions  are  done  concurrently  on  different  processors  (whenever  possible). 

4.  The  preconditioned  residuals  in  the  PCG  technique  have  regular,  predetermined  pat¬ 
terns  of  symmetry  and  antisymmetry  with  respect  to  cb  and  cd.  These  patterns  are  exploited  in 
the  solution  process  in  reducing  the  amount  of  computations,  as  well  as  in  balancing  the  comput¬ 
ational  load  between  processors. 

For  the  case  m=2,  the  procedure  for  generating  the  matrices  [K] ,  (K| ,  {Q},  and  {Q}  is 

outlined  in  Appendix  B  and  Ref.  16.  The  matrices  for  the  case  m=4  are  obtained  by  repeating  the 
same  process  as  that  used  for  m=2. 

5.  IMPLEMENTATION  ON  CRAY  X-MP  COMPUTERS 

The  proposed  computational  procedure  based  on  hierarchical  partitioning  has  been  imple- 


mented  on  the  CRAY  X-MP/4  computer  system  at  Cray  Research,  Inc.,  Mendota  Heights, 
Minnesota.  The  major  characteristics  of  the  computer  system  are  summarized  in  Appendix  D. 
The  program  was  coded  in  Fortran  using  Cray  Fortran  (CFT)  compiler  directives  for 
vectorization. 

To  improve  the  performance  an  attempt  was  made  to  maximize  the  use  of  the  chaining 
facility  of  the  CRAY  computer  (overlapping  vector  multiply,  add,  and  memory  access)  whenever 
possible  (steps  5,  6,  7,  8  and  1 1  of  Table  I).  This  was  accomplished  by  calling  the  SC1L1B 
routines  for  performing  the  following  six  matrix/vector  operations: 


SMXPY: 

{ Y }  +  [MJ  { X } 

SXMPY: 

{Y}  +  |  M  | 1  { X } 

MXV: 

IMKX] 

MXVA: 

IM]T{X} 

MXM: 

|M][A| 

MXMA: 

I  Ml 1 1  A| 

wherelM],  [A  |  are  matrices;  {X},  { Y }  are  vectors;  and  superscript  T  denotes  transposition. 

The  decomposition  and  back  substitution  in  steps  10  and  1 1  (Table  I)  were  performed  by 
using  optimized  versions  of  the  L1NPACK  routines  SPBFA  and  SPBSL  which  make  extensive 
use  of  the  chaining  facility  of  the  CRAY  computer  and  have  their  inner-most  loops  coded  in 
assembly  language  (see  Refs.  17  and  18). 

Multiprocessing  was  achieved  by  using  the  microtasking  facility  of  the  CRAY  X-MP 
computer.  This  was  accomplished  by  placing  special  directives  in  the  code,  then  passing  the 
code  through  a  utility  preprocessor  called  PREMULT  to  interpret  these  directives,  and  inserting 
the  appropriate  microtasking  code.  The  output  from  PREMULT  is  then  passed  through  the  CFT 
compiler,  The  microtasking  directives  select  the  maximum  number  of  CPU’s  to  be  used,  identify 
the  routines  and  DO  loops  to  be  niicrotasked,  and  identify  the  synchronization  points  in  the  code. 
The  details  of  these  directives  are  given  in  Refs.  19,  20  and  21.  In  the  proposed  procedure, 
microtasking  was  used  in  steps  2,  5,  6.  7,  8,  10  and  11  of  Table  I.  Synchronization  was  used  in 
steps  1 1  and  12.  To  increase  the  speedup  gain  from  multiprocessing,  microtasking  was  done  on 
the  outerloops  of  each  of  the  aforementioned  steps  in  Table  I,  and  therefore,  the  full  arrays 


(elemental  and  structure  arrays)  were  processed  by  each  CPU,  rather  than  dividing  each  of  these 
arrays  between  the  different  CPU’s. 

6.  NUMERICAL  STUDIES  AND  PERFORMANCE  EVALUATION 
OF  PROPOSED  PROCEDURE 

To  assess  the  effectiveness  of  the  proposed  computational  procedure  and  partitioning 
strategy,  a  number  of  dynamic  problems  of  unsymmetric  structures  have  been  solved  by  using 
this  procedure  on  two  CRAY  X-MP/4  computer  systems  at  CRAY  Research,  Inc.,  Mendota 
Heights,  Minnesota  -  namely,  an  X-MP/48  and  an  X-MP/4 16  (see  Appendix  D  and  Table  II  for  a 
summary  of  the  characteristics  of  these  machines).  For  each  problem,  the  accuracy  and  comput¬ 
ational  time  required  by  the  proposed  procedure  were  compared  with  those  of  the  direct  analysis 
of  the  (full)  structure.  Herein,  the  results  are  presented  for  a  typical  problem  of  a  laminated 
anisotropic  cylindrical  panel  with  an  off-center  circular  cutout.  The  loading  is  assumed  to  be 
uniformly  distributed  and  normal  to  the  panel  surface  and  to  have  a  step  variation  in  time.  The 
panel  is  made  of  graphite-epoxy  material.  The  material  and  geometric  characteristics  of  the 
panel  are  given  in  Fig.  3. 

A  total  of  192  mixed  elements  were  used  for  the  spatial  discretization  of  the  panel.  Bi¬ 
quadratic  shape  functions  were  used  for  approximating  each  of  the  generalized  displacements, 
and  bilinear  shape  functions  were  used  for  approximating  each  of  the  stress  resultants  (a  total  of 
6144  stress-resultant  parameters,  3818  nonzero  displacement  degrees  of  freedom,  and  3818 
nonzero  velocity  components).  The  element  characteristics  are  given  in  Ref.  22. 

6.1  Time  History  Response  of  the  Structure 

The  panel  was  divided  into  four  substructures  and  the  proposed  procedure  was  applied  to 
reduce  the  size  of  the  analysis  model  to  that  of  the  corresponding  doubly  symmetric  structure. 
Reflection  symmetry  (and  antisymmetry)  was  assumed  with  respect  to  each  of  the  interfaces 
between  the  different  substructures.  These  symmetries  (and  antisymmetries)  are  typically 
exhibited  by  the  response  of  orthotropic  panels  with  a  symmetric  geometry  and  symmetric 
boundary  conditions  when  subjected  to  symmetric  (and  antisymmetric)  loading. 

The  temporal  integration  was  performed  with  the  implicit  Park  three-step  method. 
Galerkin’s  method  (Crank-Nicholson  scheme)  and  the  Gear  two-step  method  were  used  to  start 


the  computation  (for  the  first  and  second  time  steps,  respectively).  For  the  linear  case,  the  critical 
time  step  for  numerical  stability  is  Atcr=().l  194  |asec.  The  time  step  selected  in  the  present  study 

was  50  (isec.  Both  the  small  displacement  (linear)  and  the  large  displacement  (geometrically 
nonlinear)  elastic  responses  of  the  panel  were  generated  for  a  duration  of  3  msec  (60  time  steps). 
An  energy  balance  check  (Ref.  23)  was  performed  at  each  time  step  to  ensure  the  accuracy  of  the 
solution.  At  each  time  step,  three  Newton-Raphson  iterations  and  an  average  of  twenty  PCG 
iterations  (per  Newton-Raphson  iteration)  were  required  to  obtain  five  significant  digits  of 
accuracy  of  the  response  quantities.  Solutions  obtained  by  using  the  proposed  strategy  were 
compared  with  the  corresponding  solutions  obtained  by  analyzing  the  full  panel  (using  the  same 
temporal  integration  scheme,  same  time  step,  and  same  number  of  Newton-Raphson  iterations  at 
each  time  step). 

The  time  histories  of  the  normal  displacement  component  wa  and  the  normal  velocity 

component  wa  at  point  a,  the  total  complementary  strain  energy  Uc,  and  the  total  kinetic  energy 
* 

K  are  shown  in  Fig.  4.  Point  a  corresponds  to  a  midside  node  which  exhibited  the  maximum 
absolute  value  for  the  normal  displacement  component  during  the  transient  analysis.  Normalized 
contour  plots  for  the  generalized  displacements  and  velocity  components  at  t=3.()  msec  of  both 
the  linear  and  nonlinear  responses  are  shown  in  Fig.  5.  Each  generalized  displacement  and 
velocity  component  is  normalized  by  its  maximum  absolute  value  as  given  in  Table  111. 

6.2  Efficiency  Gain  Obtained  by  Using  the  Proposed  Procedure 

To  assess  the  efficiency  gain  resulting  from  the  use  of  the  proposed  procedure  and  par¬ 
titioning  strategy,  the  measured  CP  times,  wall-clock  times,  and  processing  rates  obtained  by 
using  the  present  computational  procedure  were  compared  with  those  of  the  direct  analysis  of  the 
panel  (no  partitioning).  The  measured  analysis  times  and  processing  rates  for  the  first  ten  time 
steps  are  given  in  Table  IV  and  in  Figs.  6  to  8.  The  CP  times  and  processing  rates  on  a  single 
CPU  were  obtained  by  using  the  FLOP  TRACE  (on  the  X-MP/48)  and  PERF  TRACE  (on  the 
X-MP/416)  utilities  of  the  CRAY  computer  (see  Ref.  24  and  Appendix  D).  The  wall-clock  times 
for  two  and  four  CPU’s  were  recorded. 

The  application  of  the  partitioning  strategy  outlined  in  section  3  resulted  in  reducing  the 
number  of  nonzero  displacement  and  velocity  degrees  of  freedom  from  3818  to  971  and  the 


stress-resultant  parameters  from  6144  to  1536.  The  semibandwidth  of  the  global  matrices  was 
reduced  from  700  to  315. 

When  a  single  CPU  was  used,  the  measured  wall-clock  time  for  the  direct  analysis  was 
319.7  sec  on  the  X-MP/416.  The  corresponding  wall-clock  time  for  the  proposed  procedure  was 

122.6  sec.  Because  of  extensive  use  of  vectorization  and  chaining  in  the  implementation,  the 
sustained  processing  rate  for  the  direct  analysis  was  1 60  MFLOPS  (Millions  of  FLoating  point 
Operations  Per  Second).  The  corresponding  sustained  rate  for  the  proposed  procedure  was  only 

1 19.6  MFLOPS. 

The  measured  CP  times  and  speed  or  processing  rates  on  a  single  CPU  for  the  different 
modules  of  the  proposed  strategy  are  shown  in  Figs.  6  and  7.  As  expected,  the  majority  of  the 
time,  91.3  sec  (75  percent  of  the  total  time),  is  expended  in  the  decomposition  of  the  four  precon¬ 
ditioning  matrices  |K]  with  different  symmetry/antisymmetry  conditions  at  each  Newton- 

Raphson  iteration  of  each  time  step.  The  total  time  expended  in  the  PCG  iterations  was  1 1  sec  (9 
percent  of  the  total  time).  As  can  be  seen  from  Fig.  7,  the  processing  rates  of  most  of  the 
modules  was  over  100  MFLOPS.  The  analysis  time  for  the  proposed  procedure  on  two  and  four 
CPU’s  were  66.7  and  36.3  sec,  respectively,  on  the  X-MP/416.  The  speedups  gained  by  multi¬ 
processing  are  1.84  and  3.38  on  two  and  four  CPU’s,  respectively. 

The  total  speedup  ST  achieved  by  the  proposed  procedure  is  defined  as  the  ratio  of  the 

wall-clock  time  for  the  direct  analysis  of  the  full  structure  on  a  single  CPU  divided  by  the 
wall-clock  time  for  the  partitioned  structure  on  multiple  CPU’s.  The  values  of  ST  on  the  X-MP/ 

416  were  2.61, 4.79,  and  8.81  for  one,  two  and  four  CPU’s,  respectively. 

The  corresponding  analysis  times,  speedups,  and  values  of  ST  on  the  X-MP/48  are  given  in 

Fig.  8  and  Table  IV.  The  reduction  in  wall-clock  times  and  the  increase  in  speedups  on  the 
X-MP/416  (as  compared  with  those  on  the  X-MP/48)  are  due  primarily  to  the  increased  number 
of  interleaved  memory  banks,  which  results  in  fewer  memory  conflicts.  It  is  anticipated  that  the 
speedup  ratios  can  be  increased  by  further  optimizing  the  Fortran  code  used  in  implementing  the 
computational  procedure  of  the  partitioning  strategy.  The  sustained  processing  rate  would  then 
be  increased  to  that  of  the  direct  analysis  implementation. 


7.  COMMENTS  ON  PROPOSED  COMPUTATIONAL  PROCEDURE 

The  numerical  results  presented  in  the  preceding  section  clearly  demonstrate  the  effective¬ 
ness  of  the  proposed  partitioning  strategy  for  predicting  the  nonlinear  dynamic  response  of 
unsymmetric  structures  on  multiprocessor  computers.  In  particular,  the  following  comments  can 
be  made  regarding  the  key  elements  and  the  procedure  and  its  key  elements: 

1.  Even  though  the  mixed  finite  element  models  used  herein  have  been  shown  to  be 
equivalent  to  reduced  integration  displacement  models  (Refs.  22),  the  use  of  mixed  (primitive 
variable)  formulation  was  found  to  be  essential  for  solving  highly  unsymmetric  problems,  with 
strong  coupling  between  the  symmetric  and  antisymmetric  components  of  the  response.  The 
internal  forces  (or  stress  resultants)  should  only  he  eliminated  after,  not  before,  the  application 
of  operator  splitting. 

2.  The  application  of  operator  splitting  in  conjunction  with  symmetry  transformations 
allows  the  restructuring  of  the  discrete  equations  at  each  time  step  to  delineate  the  symmetric  and 
antisymmetric  vectors  constituting  the  response,  as  well  as  the  coupling  terms  between  these 
vectors. 

3.  The  preconditioned  conjugate  gradient  (PCG)  was  found  to  be  a  highly  efficient  and 
stable  technique  for  handling  highly  unsymmetric  problems. 

4.  The  proposed  computational  procedure  can  be  thought  of  as  generating  the  dynamic 
response  of  an  unsymmetric  structure,  using  small  or  large  perturbations  from  the  response  of  a 
simpler  structure  in  which  the  symmetric/antisymmetric  components  of  the  response  vector,  at 
any  time,  are  uncoupled. 

5.  The  proposed  procedure  can  also  be  used  for  the  efficient  dynamic  analysis  of  unsym¬ 
metric  structures  on  single-processor  computers.  This  is  because  the  uncoupled  equations,  for 
the  symmetric  and  antisymmetric  response  quantities,  are  the  same  except  for  the  terms  associ¬ 
ated  with  the  interface  unknowns  (see,  for  example,  Eqs.  B26  to  B31).  The  efficient  generation 
of  the  symmetric  and  antisymmetric  response  quantities  (Eqs.  C3,  C6  to  C8,  and  C28  to  C31) 
can,  therefore,  be  accomplished  by: 

a)  single  partial  decomposition  of  one  set  of  equations  (e.g.,  the  equations  associated  with 
{ Y}  which  do  not  include  the  symmetry/antisymmetry  conditions  along  the  interfaces); 


b)  incorporation  of  the  symmetry/antisymmetry  conditions,  completing  the  decomposition, 
evaluating  the  right-hand  sides,  and  successively  generating  the  symmetric  and  antisymmetric 
response  vectors. 

6.  As  the  number  of  symmetry  transformations  increases,  the  coupling  between  the 
symmetric/antisymmetric  components  of  the  response  vector  increases  and,  therefore,  the  num¬ 
ber  of  iterations  in  the  PCG  technique  increases.  Consequently,  the  proposed  procedure  may  be 
effective  on  multiprocessor  computers  with  fine  granularity  and  small  local  memories  (e.g.,  the 
hypercubes  and  the  Connection  Machine  of  Thinking  Machines,  Inc.). 

8.  CONCLUDING  REMARKS 

A  computational  procedure  is  presented  for  nonlinear  dynamic  analysis  on  vector  multi¬ 
processor  computers.  The  procedure  is  based  on  a  novel  hierarchical  partitioning  strategy  in 
which  the  response  of  an  unsymmetric  structure  at  any  time  instant  is  approximated  by  a  linear 
combination  of  symmetric  and  antisymmetric  response  vectors  (or  modes),  each  obtained  by 
using  only  a  fraction  of  the  degrees  of  freedom  of  the  original  finite  element  model. 

The  analytical  formulation  is  based  on  a  moderate-rotation  geometrically  nonlinear  theory 
of  the  structure.  A  mixed  formulation  is  used  in  which  the  fundamental  unknowns  consist  of  the 
internal  forces  (or  stress  resultants),  the  generalized  displacements,  and  the  velocity  components 
of  the  structure.  The  spatial  discretization  is  performed  by  using  mixed  finite  element  models 
with  the  stress  resultants  allowed  to  be  discontinuous  at  interelement  boundaries.  An  implicit, 
multistep,  one-derivative  scheme  is  used  for  the  temporal  integration. 

The  three  key  elements  of  the  procedure  which  result  in  a  high  degree  of  concurrency 
throughout  the  solution  process  are  1 )  mixed  (or  primitive  variable)  formulation  with  independ¬ 
ent  shape  functions  for  the  different  fields,  2)  operator  splitting  or  restructuring  of  the  governing 
discrete  equations  at  each  time  step  to  delineate  the  symmetric  and  antisymmetric  vectors  con¬ 
stituting  the  response,  and  3)  two-level  iterative  process  for  generating  the  response  of  the 
structure.  The  top-  and  bottom-level  iterations  (outer  and  inner  iteration  loops)  are  performed  by 
using  Newton-Raphson  and  preconditioned  conjugate  gradient  (PCG)  technique,  respectively. 

The  procedure  has  been  implemented  on  the  CRAY  X-MP/4  computers  at  Cray  Research, 


Inc.,  Mendota  Heights,  Minnesota.  An  assessment  is  made  of  the  speedup  resulting  from  the  use 
of  the  proposed  procedure.  The  wall-clock  times,  central  processing  times,  and  processing  rates 
are  presented  for  a  typical  problem  of  the  dynamic  response  of  a  laminated  anisotropic  cylindri¬ 
cal  panel  with  an  off-center  circular  cutout.  The  loading  is  assumed  to  be  uniformly  distributed 
and  normal  to  the  panel  surface  and  to  have  a  step  variation  in  time.  The  use  of  the  foregoing 
procedure  and  partitioning  strategy  reduces  the  total  computational  time  by  nearly  an  order  of 
magnitude  compared  with  that  required  by  the  direct  analysis  (on  a  single  processor)  for  the  first 
ten  steps. 
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Appendix  A 


Governing  Discrete  Equations  of  the  Structure 

In  the  present  study,  ihe  temporal  integration  is  performed  by  using  an  implicit,  multistep,  one-derivative 
scheme.  For  a  constant  time  step  size  At,  the  linear  multistep  operators  have  the  following  form  (see  refs.  25 
and  26): 


1=0  /=o  \al't -i 


A«ft  (£)(  -  «•, + E  -  m-,  (f  )f_ 


where  ^  refers  to  any  of  the  fundamental  unknowns;  m  is  the  number  of  steps;  and  a  and  0  are  coefficients 
associated  with  the  various  difference  operators.  The  values  of  these  coefficients  for  the  three  operators 
considered  in  the  present  study  are  given  in  table  AI. 


Table  AI.  Coefficients  in  Multistep  Operators 


Gear 

Galerkin’s  two-step 


Park 

three-step 

method 


For  the  application  of  the  multistep  integration  schemes,  it  is  convenient  to  transform  the  governing 
equations  of  the  structure  (eqs.  (1)  and  (2))  into  a  system  of  simultaneous  first-order  ordinary  differential 
equations.  This  is  accomplished  by  differentiating  the  constitutive  relations  (eqs.  (1))  with  respect  to  time. 
The  resulting  equations  can  be  written  in  the  form: 

|([F]{ff}-{GW})  =  [5]{K}  (A3) 

If  equation  (A2)  is  used,  the  governing  differential  equations  (eqs.  (2)  and  (A3))  can  be  transformed  into  a 
system  of  nonlinear  simultaneous  algebraic  equations  in  {//}  and  {V}  at  time  t.  Based  on  an  initial  estimate  for 
{X}  and/or  {V},  an  iterative  Newton-Raphson  scheme  can  be  used  for  the  solution  of  the  nonlinear  algebraic 
equations.  The  iterative  process  is  represented  by  the  following  equations  for  the  rth  iteration  cycle: 


['fiSr){A«i,)  =  w)ir) 


where 


(r)=  A^*05)  +[§xf]i) 

Symm  a,(Mj  +  [gjj] . 


-Oi-llF) 

Spmm 


Ai/?,-_,[S]j  r  jn(') 


[A  tfii.i{N(H,X)}^l 


(A7) 


W  =  {"}  (AS) 

If  the  stress-resultant  parameters  are  eliminated,  then  equations  (A4)  can  be  replaced  by  the  following 
equations  in  the  velocity  components: 


where 


[tfv](r){AV}jr)  =  {QV)  Jr) 

=  \Kw\  -  [Khv\T[Khh\~11KHv\ 

{Qv)\r)  =  ~{fv)  !r)  +  \KHv\T{KHH\~l{fH)  ir) 

^vl=cti.1M,  +  (^)!piw 

ai  LaA7Jt 


[Khv]  =  At/?,-  + 

[Khh]  = 


5Gp](r) 


(A10) 

(All) 


(A12) 


(A13) 

(A14) 


Note  that  since  the  stress  resultants  are  allowed  to  be  discontinuous  at  interelement  boundaries,  the  elimination 
can  be  done  on  the  element  level  (i.e.,  before  assembly). 

The  nodal  displacements  {X}jr+1^  at  the  end  of  the  rth  iteration  cycle,  at  time  t,  are  computed  by  using 
the  multistep  formula: 

{X)Sf+1)  =At^-{V}\r+1)  +  +  A</?l_,{V'}t_,)  (A15) 

a%  at  f=  |  \  / 


The  iterative  process  is  continued  until  convergence.  An  assessment  of  the  solution  accuracy  can  be  made 
by  using  an  energy  balance  check  at  the  end  of  each  time  step  and  reducing  the  time  step  when  the  error 
exceeds  a  prescribed  tolerance.  (See  ref.  23.) 


Appendix  B 


Symmetry  Transformations 

For  the  application  of  symmetry  transformations,  the  structure  is  first  divided  into  two  p&  ..itions  (or 
substructures),  m  =  2.  For  convenience,  each  of  the  two  substructures  is  assumed  to  have  the  same  number 
of  elements,  the  same  number  of  nodal  displacements,  and  the  same  number  of  nodal  velocities.  In  addition, 
the  interface  between  the  two  substructures  is  assumed  to  be  along  the  displacement  and  velocity  nodes,  and 
no  stress  parameters  are  associated  with  the  interface.  As  an  aid  to  the  visualization  of  the  symmetric  and 
antisymmetric  vectors,  a  symmetrized  finite  element  grid  is  introduced  which  has  the  same  symmetry  as  that 
of  the  vectors.  The  symmetrized  grid  is  constructed  on  a  symmetric  domain  having  the  same  planform  (and/or 
surface)  area  as  that  of  the  original  unsymmetric  structure.  (See  fig.  1.) 

The  vectors  of  stress  parameters,  nodal  displacements,  nodal  velocities,  and  external  loads  {ff},  {X},  (V), 
and  {(3}  are  partitioned  into  subvectors  {H}i,  (H)2J  (X}i,  {A"}2,  {AT}3;  {V)i,  (V}2,  (V}3;  and  {<3}i,{<3}2. 
{<3)3.  The  subvectors  with  subscript  1  are  associated  with  one  of  the  substructures,  the  subvectors  with 
subscript  2  are  associated  with  the  other  substructure,  and  the  subvectors  with  subscript  3  are  associated  with 
the  interface  between  the  two  substructures. 

The  matrices  [F],  [S],  and  [M]  and  the  vectors  {G(X)},  {N(H,X)},  and  {<3}  used  in  the  discrete  equations 
of  the  structure  (eqs.  (A6)  and  (A7))  are  partitioned  in  submatrix  blocks  as  follows: 

[F)  = 

[S]  = 


[M]  = 


{<?(*)}  = 


{N(H,X)}  = 


{Q}  = 


where  a  dot  in  equations  (Bl)  to  (B3)  refers  to  a  zero  submatrix  (or  subvector).  The  symmetric  and 
antisymmetric  partitions  of  the  vectors  {#},  {X},  (V),  and  (Q)  are  defined  as  follows: 


Fn  ■ 

•  F22 

(Bl) 

Su  •  5,3 

.  •  S22  S23. 

(B2) 

Afn 

M13' 

M22 

M23 

(B3) 

Symm 

M33 

* 

(Gi{XhX3)\ 

lG2(*2,*3)J 

(B4) 

f  NtiHuXuXi) 

{  n2(h2,x2,x3) 

1  (B5) 

[N3{Hi,H2,XltX2,X9)  ) 

fQl) 

{  Q2  > 

(B6) 

[Qz  1 

l».J  “UffJ  \Jhl 

{k*  }  =  [?v]  {vl} 


{£}=»} 


where 


mr]  =  |  [[/]  [/] 


M  -[/] 


[Ta-1  =  [TV]  =  [Tq]  =  |  [[/]  [/] 
{fx)  =  [fv)  =  [fQ]  =  \[[l}  -[/] 


(B13) 


and  the  matrix  [/]  is  the  identity  submatrix.  Again,  a  bar  (“)  and  a  tilde  (~)  over  a  vector  denote  the 
symmetric  and  the  antisymmetric  components  of  that  vector.  In  equations  (B7)  to  (BIO),  the  subscripts  s  and 
a  refer  to  the  symmetric  and  antisymmetric  components,  respectively.  Note  that  {//} 2,  {X}2,  {V}2,  and  (Q) 2 
account  for  the  change  in  sign  of  the  antisymmetric  components  associated  with  the  symmetric  response  (e.g., 
transverse  shear  stress  resultants,  in-plane  displacements,  and  rotation  components  in  a  shell  structure,  see 
ref.  27).  For  symmetric  structures  subjected  to  symmetric  loading  {H} 2  =  {H} i,  {X}2  =  {V}2  =  {^}i, 

and  {Q}2  =  {<3}i. 

The  following  relations  follow  from  equations  (B7)  to  (BI4): 


{8)-™{S( 


=  [Tx\ 


=  [Tv) 


Hs\ 

Haj 

(B15) 

X , 

/ 

*3, 

Xa 

(B16) 

X3  a  - 

1 

V'sa  l 
va  f 

(B17) 

V3J 

where 


fr ,_[(/]  m 

{TH]~m  -{/iJ 
[[/] 

\Tx)  =  \Tv\  =  m 


(BIS) 


1  •  (r»l  (r0]J 

The  vectors  {.^3,},  {V^*}  and  { Xza }>  {V&j}  refer  to  the  symmetric  and  antisymmetric  components  of  {X}s 
and  {V}3;  a  dot  refers  to  zero  submatrix;  (r9)  and  [ro]  are  diagonal  submatrices  with  nonzero  (unit)  entries 
corresponding  to  the  symmetric  and  antisymmetric  components  of  each  vector,  such  that 


(r,l  +  |r„l  =  |/| 


(B20) 


Oh)  =  j 

1 

k  {/« }l  ~  {/h} 2  J 

\  (B33) 

{fv} 1 + {/v  >2 

CM  =  • 

(ra]T{/v}3 

L  j 

(B34) 

II 

'  1 

{fv}  1  “  {fv} 2  1 

(B35) 

.  [r0]T{/v}3  j 

and  Aq  is  a  tracing  parameter  which  identifies  all  the  coupling  terms  to  the  symmetric/antisymmetric 
components  of  the  response  vectors.  The  expressions  for  are  similar  to 


those  of  [S\,  [5],  [M],  and  [M],  respectively. 

The  following  comments  can  be  made  in  connection  with  the  symmetry  transformations: 

1.  When  Ao  =  1,  equations  (B21)  are  equivalent  to  equations  (A4).  The  case  Ao  =  0  corresponds  to 
the  uncoupled  equations  associated  with  the  symmetric/antisymmetric  components  of  the  response  vectors 


{A H}[r)  and  (AV}jr). 

2.  The  presence  of  nonzero  terms  in  the  matrices  with  a  tilde  (")  signifies  the  coupling  between  the  symmetric 
and  antisymmetric  modes  constituting  the_  unsymmetric  response.  For  a  symmetric  structure  with  identical 
substructures,  the  matrices  [F],  [S],  and  [M]  are  zero,  and  the  equations  associated  with  the  symmetric  and 
antisymmetric  partitions  of  the  response  vectors  are  uncoupled. 

3.  For  Ao  =  0,  the  equations  associated  with  the  symmetric  and  antisymmetric  response  quantities  are 
identical  except  for  the  terms  associated  with  the  interface  unknowns  {AV}3*  and  {AV^}^.  Therefore,  only 
one  set  of  these  equations  was  considered,  and  the  symmetry /antisymmetry  conditions  were  applied  at  the 
interfaces. 

4.  The  symmetry  transformations  can  be  applied  in  a  recursive  manner  to  effect  further  reduction  in  the 
size  of  the  partitions  (or  substructures).  To  apply  the  second  symmetry  transformation  (m  =  4),  each  of 
the  response  vectors  {/f}«,  {H}a,  { V}a,  {V}a,  {Af}*,  and  {Af}0  is  decomposed  into  symmetric/antisymmetric 


components  (see  fig.  2)  and  the  associated  matrices  in  equations  (B21),  [if]  and  [K],  are  now  decomposed  into 
[R],  [A],  [A-],  and  \K\.  Also,  the  right-hand  side  vectors  {Q}  and  {Q}  are  decomposed  into  {^},  {Q},  {Q},  and 


{<?}• 


Appendix  C 

Application  of  Preconditioned  Conjugate 
Gradient  (PCG)  Technique 

For  the  case  m  =  4,  the  solution  of  the  linear  alge¬ 
braic  equations  associated  with  each  iteration  cycle  of 
the  Newton-Raphson  technique  (eqs.  (8)),  using  the 
preconditioned  conjugate  gradient  technique,  can  be 
divided  into  four  major  steps  which  are  outlined  sub¬ 
sequently.  For  convenience,  equations  (8)  are  cast  in 
the  following  form  (using  a  single  tracing  parameter 
A): 


The  vectors  {/2}o  refer  to  the  initial  residual 
vector.  Note  that  different  symmetry/antisymmetry 
conditions  are  incorporated  into  the  matrices  [A]  on 
the  left-hand  sides  of  equations  (C6)  to  (C8). 

3.  Initialize  the  symmetric/antisymmetric  compo¬ 
nents  of  the  conjugate  search  direction  vectors: 

{!}o  =  {?}o  =  0  {I}o  =  #}o  (CIO) 

{!}o  =  {y}0  {!}o  =  {?}o(cn) 

Step  2 — Line  search  and  updating  of  solution  and 
residual: 


(A]+A([A]  +  [A1  +  [A)) 


(A^> 


=  {0}  +  A({0}  +  {g}  +  {g})  (ci) 


The  solution  is  sought  for  A  =  1.  The  solution 
vector  in  each  iteration  can  be  decomposed  into  four 
symmetric/antisymmetric  components  as  follows: 


For  j  =  0,  step  j  by  1  until  convergence  and  do 
the  following: 


4.  Compute  the  step  length  ciy  along  the  search 
direction: 


(C12) 


where 


{AtP}  =  {A^}  4-  {A?}  +  {A^}  +  {AV>}  (C2) 

The  four  major  steps  of  the  solution  are 
Step  1 — Initialization: 

1.  Obtain  an  initial  estimate  {A^i}o  of  {A ip}  by 
solving  the  equations: 

{R]{Ai>}o  =  {0}  (C3) 

Note  that  {Atp)o  is  doubly  symmetric;  therefore, 

(A^}o  =  {AxJj}0  (C4) 

and 

{A^}o  =  (A0}o  =  (A?}0  =  0  (C5) 

2.  Obtain  the  symmetric/antisymmetric  compo¬ 
nents  of  the  preconditioned  residuals  by  solving 
the  following  three  equations: 

[£]#}o  =  {0}  -  [A](A^}o  =  (A}0  (C6) 

mY}o  =  {Q}-[K}{Ai>}0  =  {R}0  (C7) 

mho  =  {$}-[Z]{*ho  =  {ho  (cs) 

and 

(?}0  =  {R}0  =  0)  (C9) 


=  {?}j  {%  +  ml  {%  +  {Y}j  {^h 

+{hj{h  (cis) 

6y  =  {2)1  {I}y  +  {2)1  {I}y  +  {Z)l  {f  }y 
+  (C14) 

{I}y  =  [A]{t}y  +  [A]{I}y  +  [f]{I}y 

+  [A]{I)y  (CIS) 

{I}y  =  (A){%  +  [A]{l}y  +  [K]{I}y 

+  lkihj  (Cl6) 

{l}y  =  [A]{Z}y  +  [A]{I}y  +  [K]{|:}y 

+  [fl{I}y  (C17) 

{!}y  =  [A){l}y  +  [A]{I}y  +  [^]{I}y 

+  [Aj{f)y  (C18) 

5.  Update  the  symmetric/antisymmetric  compo¬ 

nents  of  the  stress-resultants  and  velocity  incre¬ 
ments: 

{A%H  =  (A^}y  +  «y{£}y  (C19) 

{A?}y+i  =  (A$}y  +  Ctj{2)j 


(C20) 


(C21) 


Step  4— Computation  of  new  search  direction  vector: 


+  aj{Z}j 

{^}j+ 1  =  {^P)j  +  VjiZ}}  (C22) 

6.  Compute  and  update  the  symmetric/antisym¬ 
metric  components  of  the  new  residual  vector  us¬ 
ing  the  equations: 

{£}i+1  =  -a;{ !};  +  {%  (C23) 

{%H  =  -«,•{%  +  {%  (C24) 

{R}j+ 1  =  -Oj{z}j  +  {R}j  (C25) 
{•^}j+l  =  ~°cj{z}j  +  {^},j  (C26) 

Step  3 — Convergence  check: 

7.  If 

Mj+1  <  «l*lo  (C27) 

where  |.ft|  is  the  Euclidean  norm  of  the  residual 
vector  and  e  is  a  prescribed  tolerance,  then  stop, 
otherwise  continue. 


8.  Solve  for  the  symmetric/antisymmetric  compo¬ 
nents  of  the  preconditioned  residual  vector 


(R|{?)J+1  =  <%+ 1 

(C28) 

l*|{?>;+»  =  {%+i 

(C29) 

[*]{?},+.  =  {S)y+i 

(C30) 

!*K?)j+i  “  {%+i 

(C31) 

9.  Compute  the  orthogonalization  coefficient  / 9J+1 
using 

Pj+l  —  aj+l/aj  (C32) 

where  the  a’s  are  defined  in  equations  (C13). 

10.  Update  the  symmetric/antisymmetric  compo¬ 
nents  of  the  conjugate  search  direction  vectors 

{4}>+1-?y+,{i}i  +  {f}<+1  (C33) 

{%„  =  3,'+1{l>y  +  {?},+,  (C34) 

{Z}j+l  =  Pi+ilh  +  {f)j+ 1  (0») 

{l},+1=ft+1{I>,-  +  {?>.+,  (C36) 


Appendix  D 

Summary  of  Major  Features  of  CRAY 
X-MP  Computers 

The  CRAY  X-MP  computer  system  is  a  multi¬ 
processor  vector  machine  with  shared  central  mem¬ 
ory.  The  two  models  used  in  the  present  study  are 
the  CRAY  X-MP/48  and  X-MP/416  computers  (se¬ 
rial  numbers  231  and  218)  at  Cray  Research,  Inc., 
Mendota  Heights,  Minnesota.  Each  one  of  the  two 
models  has  four  CPU’s  and  an  ECL  bipolar  cen¬ 
tral  memory.  A  summary  of  the  hardware  and  soft¬ 
ware  characteristics  of  the  two  computers  is  given  in 
table  II. 

Each  of  the  CPU’s  has  the  following  features: 

8.5  nsec  clock  period 

14  functional  units 

Scalar  and  vector  instructions 

Four  parallel  memory  ports  connected  to  the 

central  memory:  two  for  vector  and  scalar 

fetches,  one  for  result  store,  and  one  for 

independent  input/output  operations. 

The  estimated  peak  performance  of  each  CPU  is 
210  MFLOPS.  This  can  only  be  achieved  by  using 
the  chaining  facility  of  the  CRAY,  that  is,  by  over¬ 
lapping  the  vector  multiply,  add,  and  memory  access 
operations. 

The  central  memory  has  a  capacity  of  8  Mwords 
on  the  X-MP/48  model  and  16  Mwords  on  the 
X-MP/416  model.  The  memory  is  organized  into  32 
and  64  interleaved  banks  on  the  X-MP/48  and  the 
X-MP/416  models. 

The  two  facilities  for  multiple  CPU  utilization 
on  the  CRAY  X-MP  system  are  macrotasking  and 


microtasking.  (See  ref.  28.)  Macrotasking  is  done 
on  the  FORTRAN  subroutine  level.  User  interaction 
with  the  macrotasking  environment  is  done  via  calls 
to  the  macrotasking  library.  Microtasking  is  done  via 
compiler  directives.  The  granularity  of  microtasking 
is  finer  than  that  of  macrotasking  (i.e.,  down  to  the 
outer  do-loop  level).  The  overhead  associated  with 
microtasking  is  considerably  less  than  that  involved 
in  macrotasking. 

A  number  of  performance  monitors  are  available 
on  the  CRAY  X-MP  system.  The  two  used  in  the 
present  study  are  FLOP  TRACE  on  the  X-MP/48 
and  PERF  TRACE  on  the  X-MP/416.  (See  ref.  24.) 
Each  of  the  two  monitors  produces  a  table  showing 
the  subroutine  name,  the  number  of  times  it  was 
called,  the  execution  time,  the  average  time  per 
call,  the  percentage  of  total  program  time  spent  in 
the  subroutine,  the  accumulated  percentage  of  total 
program  execution  times  for  subroutines  up  to  the 
current  one,  the  number  of  additions,  multiplications, 
and  reciprocals  performed  by  the  subroutine,  the 
total  floating  point  operations,  the  ratio  of  memory 
references  to  floating  point  operations,  the  memory 
reference  rate,  and  the  processing  rate  (in  MFLOPS). 
The  table  is  sorted  in  decreasing  percentage,  so  that 
the  most  important  routines  are  at  the  top  of  the  list. 
The  same  statistics  are  displayed  for  the  program  as 
a  whole. 

Although  these  utilities  are  very  useful  in  identi¬ 
fying  the  time  consuming  subroutines  and  the  candi¬ 
date  routines  for  macrotasking,  they  are  used  with 
one  CPU  only.  To  estimate  the  speedup  result¬ 
ing  from  multiprocessing,  the  only  facility  currently 
available  is  measuring  the  wall-clock  time. 


Table  I.  Flowchart  of  Computational  Procedure 

Preprocessing  Phase 

1.  Input  problem  and  model  data 

2.  Generate  the  linear  modified  elemental  arrays  (see  appendix  B) 

Solution  Phase 

3.  Begin  time  step  loop 

4.  Begin  Newton- Raphson  iteration  loop 

5.  Predict  velocity  components,  stress  resultants,  and  generalized  displacements 

6.  Generate  nonlinear  modified  elemental  arrays  on  the  left-hand  side  of  the  equations  (see  appendix  B) 

7.  Eliminate  stress  resultants  from  elemental  equations  and  assemble  the  left-hand  side  for  the 
reduced-size  model  (see  eqs.  (A9)  to  (A14)) 

8.  Evaluate  right-hand  side  elemental  arrays  and  assemble  them  (see  eqs.  (A7)  and  (All)) 

9.  Begin  PCG  iteration  loop  (see  appendix  C) 

10.  Decompose  the  preconditioning  matrices  associated  with  different  symmetry/antisymmetry 
conditions  (eqs.  (C3),  (C6),  (C7),  and  (C8)) 

11.  Calculate  the  symmetric/antisymmetric  components  of  the  residual  vector,  the  preconditioned 
residuals,  step  length,  orthogonalization  coefficient,  conjugate  search  direction,  and  update  the 
response  vectors  (tasks  numbers  2  through  6  and  8  through  10,  appendix  C) 

12.  Check  convergence  of  PCG,  if  satisfied  continue,  otherwise  go  to  step  11 

13.  Check  convergence  of  Newton-Raphson  iterations,  if  satisfied  continue,  otherwise  go  to  step  6 

14.  Check  if  t  =  <max  then  stop,  otherwise  set  t  =  t  -f  At  and  go  to  step  4 


Table  II.  Characteristics  of  CRAY  X-MP/4  Computer  Systems  Used  in  Present  Study 


Characteristics 

X-MP/48 

X-MP/416 

Number  of  CPU’s . 

4 

4 

Clock  cycle,  nsec . 

8.5 

8.5 

Memory . 

8  Mwords0  (32  banks) 
central  mf.raory 

16  Mwords0  (64  banks) 
central  memory 

Storage  size . 

32  Mwords  SSD6 

512  Mwords  SSD6 

Operating  system . 

COS  1.16  BF2 

UNICOS  3.0.8 

Compiler  . 

CFT  version  1.15  BF2 

CFT  version  1.15  BF2 

Performance  monitor 
(on  one  CPU) . 

FLOP  TRACE 

PERF  TRACE 

°1  Mword  =  1048576  words  (each  word  is  64  bits). 

6The  SSD  (solid-state  storage  device)  functions  as  a  very  high  speed  secondary  memory  and  is 
treated  by  the  operating  system  and  the  user  as  another  disk  drive. 


Table  III.  Maximum  Absolute  Values  of  Generalized  Displacements 
and  Velocity  Components,  {X}  and  {V},  at  t  =  3.0  msec 

Laminated  anisotropic  composite  panel  with  an  off-center 
circular  cutout  subjected  to  uniform  normal  loading 
Po  =  —50000  Pa  (see  figs.  3  and  5) 


Generalized  displacements: 
w 

7T . 

. 

. 

102  x  4>i . 

10  X  <f>2 . 

Velocity  components: 

10~4  x  £ . 

10-2  xfy . 

10"3  x  ^ . 

10“2  x  4>i . 

10-3  X  (f>2 . 


Linear  Nonlinear 

0.2940  1.007 

0.1945  0.3634 

0.7922  1.654 

0.9364  1.825 

0.1862  0.5595 

0.2554  0.1775 

0.6049  0.2036 

0.2848  0.1237 

0.8394  0.4421 

0.1482  0.1708 


Table  IV.  Performance  Evaluation  of  Proposed  Strategy  on  CRAY  X-MP/4  Computers 

Laminated  anisotropic  composite  panel  with  off-center  circular  cutout  subjected 
to  uniform  normal  loading  p0  =  —50000  Pa  (see  figs.  3  and  5) 


X-MP/48 

COS  1.16  BF2  operating 
system,  CFT  compiler 
version  1.15  BF2 

X-MP/416 

UNICOS  3.0.8  operating 
system,  CFT  compiler 
version  1.15  BF2 

Full 

structure 

Partitioned 

structure 

Full 

structure 

Partitioned 

structure 

Wall-clock  time  for  first 

10  time  steps,  sec . 

339 

n 

319.7 

122.6  (1  CPU) 
66.7  (2  CPU’s) 
36.3  (4  CPU’s)  . 

Sustained  speed  on 

one  CPU,  MFLOPS . 

152.1 

119.1 

160 

119.6 

Speedup  due  to 

multiprocessing . 

ifi 

Total  speedup,  S^,  achieved 
by  proposed  procedure . 

1.0 

2.75  (1  CPU) 
4.93  (2  CPU’s) 
8.13  (4  CPU’s) 

1.0 

2.61  (1  CPU) 
4.79  (2  CPU’s) 
8.81  (4  CPU’s) 
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Figure  1.  Original  finite  element  model  and  reduced-size  models  (substructures). 
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Figure  3.  Laminated  anisotropic  composite  panel  with,  off-center  circular  cutout  used  in  present  study. 


Nonlinear 


Figure  4.  Linear  and  nonlinear  dynamic  responses  of  laminated  anisotropic  composite  panel  with  off-center 
circular  cutout  subjected  to  uniform  normal  loading  pa  =  —50000  Pa  (see  fig.  3). 


(b)  Nonlinear  generalized  displacements. 
Figure  5.  Continued. 


(c)  Linear  velocity  components. 
Figure  5.  Continued. 


(d)  Nonlinear  velocity  components. 
Figure  5.  Concluded. 
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during  first  10  time  steps. 
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